No description has been provided for this image
Open In Colab

Basic spatial operations in Geo Dataframes¶

We will review some important operations for geodataframes. As usual, let's do this:

  1. Create a repository named: geodf_operations.
  2. Clone that repo to a local folder in your computer.
  3. In that local folder in your computer, create a folder named maps and data.
  4. Download the shapefile of "Brazil - Subnational Administrative Boundaries"(bra_adm_ibge_2020_SHP.zip) from here and save it in the maps folder (you need to unzip the file).
  5. Download a CSV file with information on the airports in Brazil from this website, Save it in my data folder:
  6. Commit and push.

Let's remember the contents of the world map from last session:

In [ ]:
linkWorldMap="https://github.com/CienciaDeDatosEspacial/intro_geodataframe/raw/main/maps/worldMaps.gpkg"


import geopandas as gpd
from  fiona import listlayers
listlayers(linkWorldMap)
Out[ ]:
['countries', 'rivers', 'cities', 'indicators']

Let's open all the layers

In [ ]:
countries=gpd.read_file(linkWorldMap,layer='countries')
rivers=gpd.read_file(linkWorldMap,layer='rivers')
cities=gpd.read_file(linkWorldMap,layer='cities')
indicators=gpd.read_file(linkWorldMap,layer='indicators')

Now, let's see some important spatial operations:

  1. Subsetting
  2. Projections
  3. Combining GeoDF rows
  4. The convex hull
  5. Spatial Overlays
  6. Checking Validity

Subsetting¶

Filtering¶

You can keep some elements by subsetting by filtering, as we used to do n common pandas data frames.

In [ ]:
countries.head()
Out[ ]:
COUNTRY geometry
0 Aruba (Netherlands) POLYGON ((-69.88223 12.41111, -69.94695 12.436...
1 Antigua and Barbuda MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...
2 Afghanistan POLYGON ((61.27656 35.60725, 61.29638 35.62853...
3 Algeria POLYGON ((-5.15213 30.18047, -5.13917 30.19236...
4 Azerbaijan MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...
In [ ]:
# as DF
countries.iloc[50:,]
Out[ ]:
COUNTRY geometry
50 Central African Republic POLYGON ((20.45330 4.52379, 20.45798 4.61931, ...
51 Cuba MULTIPOLYGON (((-76.94608 21.45221, -76.88390 ...
52 Cape Verde MULTIPOLYGON (((-25.28139 16.91333, -25.29861 ...
53 Cook Islands (New Zealand) MULTIPOLYGON (((-165.84167 -10.89084, -165.848...
54 Cyprus POLYGON ((33.27229 34.70955, 33.21722 34.69944...
... ... ...
247 South Sudan POLYGON ((34.21807 9.96458, 34.20722 9.90500, ...
248 Indonesia MULTIPOLYGON (((123.21846 -10.80917, 123.19832...
249 East Timor MULTIPOLYGON (((124.41824 -9.30010, 124.40446 ...
250 Curacao (Netherlands) POLYGON ((-68.96556 12.19889, -68.91196 12.181...
251 Bonaire (Netherlands) POLYGON ((-68.19736 12.22264, -68.19292 12.207...

202 rows × 2 columns

In [ ]:
# as DF
countries.loc[50:,'geometry']
Out[ ]:
50     POLYGON ((20.45330 4.52379, 20.45798 4.61931, ...
51     MULTIPOLYGON (((-76.94608 21.45221, -76.88390 ...
52     MULTIPOLYGON (((-25.28139 16.91333, -25.29861 ...
53     MULTIPOLYGON (((-165.84167 -10.89084, -165.848...
54     POLYGON ((33.27229 34.70955, 33.21722 34.69944...
                             ...                        
247    POLYGON ((34.21807 9.96458, 34.20722 9.90500, ...
248    MULTIPOLYGON (((123.21846 -10.80917, 123.19832...
249    MULTIPOLYGON (((124.41824 -9.30010, 124.40446 ...
250    POLYGON ((-68.96556 12.19889, -68.91196 12.181...
251    POLYGON ((-68.19736 12.22264, -68.19292 12.207...
Name: geometry, Length: 202, dtype: geometry

But as a GeDF, you can also filter using a coordinate point via cx. Let me get the bounding box of the map:

In [ ]:
countries.total_bounds
Out[ ]:
array([-180.        ,  -90.        ,  180.        ,   83.62359619])

As you are getting [minx, miny, maxx, maxy] let me select a valid coordinate, i.e. (0,0)

In [ ]:
countries.cx[:0,:0]
Out[ ]:
COUNTRY geometry
9 American Samoa (US) POLYGON ((-170.74390 -14.37556, -170.74942 -14...
10 Argentina MULTIPOLYGON (((-71.01648 -36.47591, -70.98195...
14 Antarctica MULTIPOLYGON (((-45.14528 -60.76611, -45.15639...
24 Bolivia POLYGON ((-62.19884 -20.47139, -62.26945 -20.5...
29 Brazil MULTIPOLYGON (((-70.62862 -9.94849, -70.62889 ...
42 Chile MULTIPOLYGON (((-73.61806 -51.63390, -73.60494...
47 Colombia MULTIPOLYGON (((-81.71306 12.49028, -81.72014 ...
53 Cook Islands (New Zealand) MULTIPOLYGON (((-165.84167 -10.89084, -165.848...
58 Jarvis Island (US) POLYGON ((-160.02115 -0.39806, -160.02811 -0.3...
60 Ecuador MULTIPOLYGON (((-78.70903 -4.58479, -78.72348 ...
71 Fiji MULTIPOLYGON (((180.00000 -16.17274, 179.98621...
72 Falkland Islands (UK) MULTIPOLYGON (((-59.79139 -51.24945, -59.73195...
75 French Polynesia (France) MULTIPOLYGON (((-140.17783 -8.95639, -140.1894...
158 Niue (New Zealand) POLYGON ((-169.89389 -19.14556, -169.93088 -19...
169 New Zealand MULTIPOLYGON (((177.91779 -38.94280, 177.90970...
170 Paraguay POLYGON ((-57.67267 -25.29430, -57.70639 -25.2...
171 Pitcairn Islands (UK) MULTIPOLYGON (((-128.33221 -24.32727, -128.326...
172 Peru POLYGON ((-69.56750 -10.95056, -69.56844 -10.9...
196 St. Helena (UK) POLYGON ((-5.71298 -15.99286, -5.72917 -16.005...
208 South Georgia and the South Sandwich Is (UK) MULTIPOLYGON (((-36.99139 -54.35056, -36.99973...
216 Tokelau (New Zealand) POLYGON ((-171.84805 -9.21889, -171.85886 -9.2...
217 Tonga MULTIPOLYGON (((-173.93921 -18.56889, -173.933...
231 Uruguay POLYGON ((-58.38889 -33.42250, -58.41590 -33.4...
239 Wallis and Futuna (France) MULTIPOLYGON (((-176.16504 -13.35305, -176.169...
242 Western Samoa MULTIPOLYGON (((-172.59650 -13.50911, -172.551...
In [ ]:
#then
countries.cx[:0,:0].plot()
Out[ ]:
<Axes: >
No description has been provided for this image

Notice cx would be cleaner if spatial element is a point.

Let me keep one country:

In [ ]:
brazil=countries[countries.COUNTRY=='Brazil']

Clipping¶

But you can also subset by clipping, as sometimes other data frames may not have the same fields for filtering:

In [ ]:
citiesBrazil_clipped = gpd.clip(gdf=cities,
                          mask=brazil)
riversBrazil_clipped = gpd.clip(gdf=rivers,
                               mask=brazil)

Then, you can plot the clipped version:

In [ ]:
base = brazil.plot(facecolor="greenyellow", edgecolor='black', linewidth=0.4,figsize=(5,5))
citiesBrazil_clipped.plot(marker='+', color='red', markersize=15,
                    ax=base)
riversBrazil_clipped.plot(edgecolor='blue', linewidth=0.5,
                    ax=base)
Out[ ]:
<Axes: >
No description has been provided for this image

The geometry types are not modified:

In [ ]:
set(brazil.geom_type), set(citiesBrazil_clipped.geom_type), set(riversBrazil_clipped.geom_type)
Out[ ]:
({'MultiPolygon'}, {'Point'}, {'LineString', 'MultiLineString'})

Exercise 1¶

  1. Follow the same steps in this last section to plot three maps of one country. Do not use Brazil.
  2. Plot your three layers.

TOC _____________

Map Projection¶

The CRS (Coordinate Reference System) is a very important property of the maps. They affect three some aspects:

  • shape
  • area
  • scale
  • direction

Most maps come with a default CRS: 4326. Pay attention:

In [ ]:
brazil.crs
Out[ ]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [ ]:
# check units of measurement
brazil.crs.axis_info
Out[ ]:
[Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree),
 Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)]
In [ ]:
# is this CRS projected?
brazil.crs.is_projected
Out[ ]:
False

Polygons have a centroid. When we try getting a centroid from an unprojected polygon, you get:

In [ ]:
# centroid
brazil.centroid
<ipython-input-16-7f66e1ea2110>:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  brazil.centroid
Out[ ]:
29    POINT (-53.09009 -10.77302)
dtype: geometry

Reprojecting¶

A projected CRS will have units in meters or feet (or similar). You can request a crs per country here:

In [ ]:
# recommended for Brazil (meters)
brazil.to_crs(5641).crs.axis_info
Out[ ]:
[Axis(name=Easting, abbrev=X, direction=east, unit_auth_code=EPSG, unit_code=9001, unit_name=metre),
 Axis(name=Northing, abbrev=Y, direction=north, unit_auth_code=EPSG, unit_code=9001, unit_name=metre)]
In [ ]:
# now this works with no warning
brazil.to_crs(5641).centroid
Out[ ]:
29    POINT (3884486.179 8756856.093)
dtype: geometry
In [ ]:
# replotting:

base5641=brazil.to_crs(5641).plot()
brazil.to_crs(5641).centroid.plot(color='red',ax=base5641)
Out[ ]:
<Axes: >
No description has been provided for this image

Let's keep the projected version for all our maps:

In [ ]:
brazil_5641=brazil.to_crs(5641)
cities_brazil_5641=citiesBrazil_clipped.to_crs(brazil_5641.crs)
rivers_brazil_5641=riversBrazil_clipped.to_crs(brazil_5641.crs)
In [ ]:
from google.colab import drive
drive.mount('/content/drive/')
Mounted at /content/drive/
In [ ]:
## saving


import os
maps = '/content/drive/MyDrive/Colab Notebooks/maps'
brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='country', driver="GPKG")
cities_brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='cities', driver="GPKG")
rivers_brazil_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='rivers', driver="GPKG")
brazil_5641.centroid.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='centroid', driver="GPKG")

Exercise 2¶

  1. Reproject your country's map layers.
  2. Plot the reprojected layers
  3. Save the reprojected layers

Creating Spatial data¶

You will get Lines and Polygons as maps for sure, but that may not be the case with points. Let's open the CSV with information on the airports in Brazil:

In [ ]:
import pandas as pd

data = '/content/drive/MyDrive/Colab Notebooks/data'

infoairports=pd.read_csv(os.path.join(data,"br-airports.csv"))

# some rows

infoairports.iloc[[0,1,2,3,-4,-3,-2,-1],:] #head and tail
Out[ ]:
id ident type name latitude_deg longitude_deg elevation_ft continent country_name iso_country ... municipality scheduled_service gps_code iata_code local_code home_link wikipedia_link keywords score last_updated
0 #meta +id #meta +code #loc +airport +type #loc +airport +name #geo +lat #geo +lon #geo +elevation +ft #region +continent +code #country +name #country +code +iso2 ... #loc +municipality +name #status +scheduled #loc +airport +code +gps #loc +airport +code +iata #loc +airport +code +local #meta +url +airport #meta +url +wikipedia #meta +keywords #meta +score #date +updated
1 5910 SBGR large_airport Guarulhos - Governador André Franco Montoro In... -23.431944 -46.467778 2461 SA Brazil BR ... São Paulo 1 SBGR GRU SP0002 http://www.aeroportoguarulhos.net/ https://en.wikipedia.org/wiki/S%C3%A3o_Paulo-G... Cumbica 1016675 2021-10-28T15:52:55+00:00
2 5906 SBGL large_airport Rio Galeão – Tom Jobim International Airport -22.809999 -43.250557 28 SA Brazil BR ... Rio De Janeiro 1 SBGL GIG RJ0001 https://www.riogaleao.com/passageiros https://en.wikipedia.org/wiki/Rio_de_Janeiro-G... Galeão - Antônio Carlos Jobim International Ai... 51475 2024-05-09T15:04:08+00:00
3 5974 SBSP medium_airport Congonhas Airport -23.627657 -46.654601 2631 SA Brazil BR ... São Paulo 1 SBSP CGH SP0001 http://www.infraero.gov.br/usa/aero_prev_home.... https://en.wikipedia.org/wiki/Congonhas-S%C3%A... http://www.infraero.gov.br/usa/aero_prev_home.... 750 2022-05-03T20:10:35+00:00
7082 309669 SSVR closed Volta Redonda Airport -22.4978 -44.085 1245 SA Brazil BR ... Volta Redonda 0 NaN NaN NaN NaN NaN SSVR, SSVR, QVR 0 2013-09-28T14:52:12+00:00
7083 341727 BR-1429 heliport Santa Helena Heliport -23.59851 -47.441196 2254 SA Brazil BR ... Votorantim 0 SWHE NaN SP0807 NaN NaN NaN 0 2021-03-07T10:30:07+00:00
7084 343017 BR-1493 heliport Bandeiras Centro Empresarial Heliport -23.536615 -47.449475 1827 SA Brazil BR ... Votorantim 0 SWST NaN SP1306 NaN NaN NaN 0 2021-04-14T20:12:01+00:00
7085 509863 SN3D heliport Alphaville Nova Esplanada 2 Heliport -23.558971 -47.473779 2083 SA Brazil BR ... Votorantim 0 SN3D NaN SP1453 NaN NaN NaN 0 2023-06-30T14:01:13+00:00

8 rows × 23 columns

This needs some cleaning:

In [ ]:
# bye first row
infoairports.drop(index=0,inplace=True)
infoairports.reset_index(drop=True, inplace=True)
infoairports.head()
Out[ ]:
id ident type name latitude_deg longitude_deg elevation_ft continent country_name iso_country ... municipality scheduled_service gps_code iata_code local_code home_link wikipedia_link keywords score last_updated
0 5910 SBGR large_airport Guarulhos - Governador André Franco Montoro In... -23.431944 -46.467778 2461 SA Brazil BR ... São Paulo 1 SBGR GRU SP0002 http://www.aeroportoguarulhos.net/ https://en.wikipedia.org/wiki/S%C3%A3o_Paulo-G... Cumbica 1016675 2021-10-28T15:52:55+00:00
1 5906 SBGL large_airport Rio Galeão – Tom Jobim International Airport -22.809999 -43.250557 28 SA Brazil BR ... Rio De Janeiro 1 SBGL GIG RJ0001 https://www.riogaleao.com/passageiros https://en.wikipedia.org/wiki/Rio_de_Janeiro-G... Galeão - Antônio Carlos Jobim International Ai... 51475 2024-05-09T15:04:08+00:00
2 5974 SBSP medium_airport Congonhas Airport -23.627657 -46.654601 2631 SA Brazil BR ... São Paulo 1 SBSP CGH SP0001 http://www.infraero.gov.br/usa/aero_prev_home.... https://en.wikipedia.org/wiki/Congonhas-S%C3%A... http://www.infraero.gov.br/usa/aero_prev_home.... 750 2022-05-03T20:10:35+00:00
3 5967 SBRJ medium_airport Santos Dumont Airport -22.9105 -43.163101 11 SA Brazil BR ... Rio de Janeiro 1 SBRJ SDU RJ0002 https://www4.infraero.gov.br/aeroportos/aeropo... https://en.wikipedia.org/wiki/Santos_Dumont_Ai... RIO 750 2022-03-28T23:27:00+00:00
4 5872 SBBR large_airport Presidente Juscelino Kubitschek International ... -15.869167 -47.920834 3497 SA Brazil BR ... Brasília 1 SBBR BSB DF0001 http://www.infraero.gov.br/usa/aero_prev_home.... https://en.wikipedia.org/wiki/Bras%C3%ADlia_In... NaN 51275 2020-08-24T11:15:12+00:00

5 rows × 23 columns

In [ ]:
# keep the  columns needed

infoairports.columns
Out[ ]:
Index(['id', 'ident', 'type', 'name', 'latitude_deg', 'longitude_deg',
       'elevation_ft', 'continent', 'country_name', 'iso_country',
       'region_name', 'iso_region', 'local_region', 'municipality',
       'scheduled_service', 'gps_code', 'iata_code', 'local_code', 'home_link',
       'wikipedia_link', 'keywords', 'score', 'last_updated'],
      dtype='object')
In [ ]:
keep=['name','type','latitude_deg', 'longitude_deg','elevation_ft','region_name','municipality']
infoairports=infoairports.loc[:,keep]

infoairports.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7085 entries, 0 to 7084
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           7085 non-null   object
 1   type           7085 non-null   object
 2   latitude_deg   7085 non-null   object
 3   longitude_deg  7085 non-null   object
 4   elevation_ft   6915 non-null   object
 5   region_name    7085 non-null   object
 6   municipality   7060 non-null   object
dtypes: object(7)
memory usage: 387.6+ KB

Some formatting:

In [ ]:
numericCols=['latitude_deg', 'longitude_deg','elevation_ft']
infoairports[numericCols]=infoairports.loc[:,numericCols].apply(lambda x:pd.to_numeric(x))
In [ ]:
# now
infoairports.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7085 entries, 0 to 7084
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           7085 non-null   object 
 1   type           7085 non-null   object 
 2   latitude_deg   7085 non-null   float64
 3   longitude_deg  7085 non-null   float64
 4   elevation_ft   6915 non-null   float64
 5   region_name    7085 non-null   object 
 6   municipality   7060 non-null   object 
dtypes: float64(3), object(4)
memory usage: 387.6+ KB
In [ ]:
# let's plot

base = brazil_5641.plot(color='white', edgecolor='black') #unprojected

infoairports.plot.scatter(x = 'longitude_deg', y = 'latitude_deg',ax=base)
Out[ ]:
<Axes: xlabel='longitude_deg', ylabel='latitude_deg'>
No description has been provided for this image

Why is it wrong?

In [ ]:
airports=gpd.GeoDataFrame(data=infoairports.copy(),
                 geometry=gpd.points_from_xy(infoairports.longitude_deg,
                                             infoairports.latitude_deg),
                 crs=brazil.crs.to_epsg())# the coordinates were in degrees - unprojected

# let's plot

base = brazil_5641.plot(color='white', edgecolor='black')
airports.plot(ax=base)
Out[ ]:
<Axes: >
No description has been provided for this image
In [ ]:
#remember:
type(airports), type(infoairports)
Out[ ]:
(geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame)

Let's re project!

In [ ]:
airports_5641=airports.to_crs(5641)

## then

base = brazil_5641.plot(color='white', edgecolor='black')
airports_5641.plot(ax=base)
Out[ ]:
<Axes: >
No description has been provided for this image

Remember you have type of airports:

In [ ]:
airports_5641['type'].value_counts() # this will not work: airports.type.value_counts()
Out[ ]:
type
small_airport     4880
heliport          1787
closed             282
medium_airport     124
large_airport       11
seaplane_base        1
Name: count, dtype: int64

We may use that in the future. For now, just rename the type column to a different one.

In [ ]:
airports_5641.rename(columns={'type':'kind'},inplace=True)

## adding the airports to GPKG
airports_5641.to_file(os.path.join(maps,"brazilMaps_5641.gpkg"), layer='airports', driver="GPKG")

Exercise 3¶

  1. Find the airports for your country here. The data is in a CSV file.
  2. Create projected layer of airports.
  3. Plot all the layers and airports on top.

Formating Geoseries projections¶

You know brazil_5641 is a multipolygon:

In [ ]:
brazil_5641
Out[ ]:
COUNTRY geometry
29 Brazil MULTIPOLYGON (((1926257.542 8894978.397, 19262...

Sometime, you just need the border (lines):

In [ ]:
brazil_5641.boundary
Out[ ]:
29    MULTILINESTRING ((1926257.542 8894978.397, 192...
dtype: geometry
In [ ]:
# This is just the borderline
brazil_5641.boundary.plot()
Out[ ]:
<Axes: >
No description has been provided for this image

Always check the data type:

In [ ]:
# does 'boundary' return a GDF?
type(brazil_5641.boundary)
Out[ ]:
geopandas.geoseries.GeoSeries
def __init__(data=None, index=None, crs=None, **kwargs)
/usr/local/lib/python3.10/dist-packages/geopandas/geoseries.pyA Series object designed to store shapely geometry objects.

Parameters
----------
data : array-like, dict, scalar value
    The geometries to store in the GeoSeries.
index : array-like or Index
    The index for the GeoSeries.
crs : value (optional)
    Coordinate Reference System of the geometry objects. Can be anything accepted by
    :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
    such as an authority string (eg "EPSG:4326") or a WKT string.

kwargs
    Additional arguments passed to the Series constructor,
     e.g. ``name``.

Examples
--------

>>> from shapely.geometry import Point
>>> s = geopandas.GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s
0    POINT (1.00000 1.00000)
1    POINT (2.00000 2.00000)
2    POINT (3.00000 3.00000)
dtype: geometry

>>> s = geopandas.GeoSeries(
...     [Point(1, 1), Point(2, 2), Point(3, 3)], crs="EPSG:3857"
... )
>>> s.crs  # doctest: +SKIP
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

>>> s = geopandas.GeoSeries(
...    [Point(1, 1), Point(2, 2), Point(3, 3)], index=["a", "b", "c"], crs=4326
... )
>>> s
a    POINT (1.00000 1.00000)
b    POINT (2.00000 2.00000)
c    POINT (3.00000 3.00000)
dtype: geometry

>>> s.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

See Also
--------
GeoDataFrame
pandas.Series

Some operations in geopandas require GDF or GS. If you need a GDF instead of a GS:

In [ ]:
# converting into GDF
brazil_5641.boundary.to_frame()
Out[ ]:
0
29 MULTILINESTRING ((1926257.542 8894978.397, 192...

Notice you get a very simple GDF, and you may want to add some information:

In [ ]:
# conversion
brazil_border=brazil_5641.boundary.to_frame()

# new column (optional)
brazil_border['name']='Brazil'

# renaming the geometry column
brazil_border.rename(columns={0:'geometry'},inplace=True)

#setting the geometry (the name is not enough)
brazil_border = brazil_border.set_geometry("geometry")

# verifying:
brazil_border.crs
Out[ ]:
<Projected CRS: EPSG:5641>
Name: SIRGAS 2000 / Brazil Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: Brazil - offshore - equatorial margin.
- bounds: (-51.64, -5.74, -32.43, 7.04)
Coordinate Operation:
- name: Petrobras Mercator
- method: Mercator (variant B)
Datum: Sistema de Referencia Geocentrico para las AmericaS 2000
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

You can add this GDF as a layer.

Exercise 4¶

  1. Check if your country is a polygon or multipolygon.

  2. Recover just the boundaries of that country.

  3. Turn the boundary into a GDF.

Maps Lacking CRS information¶

Reprojecting seems a simple process, but you might find some interesting cases. Let's read the maps on states and municipalities:

In [ ]:
brazil_states=gpd.read_file(os.path.join(maps,"bra_adm_ibge_2020_shp","bra_admbnda_adm1_ibge_2020.shp"))
brazil_municipalities=gpd.read_file(os.path.join(maps,"bra_adm_ibge_2020_shp","bra_admbnda_adm2_ibge_2020.shp"))

They are maps, for sure:

In [ ]:
type(brazil_states), type(brazil_municipalities)
Out[ ]:
(geopandas.geodataframe.GeoDataFrame, geopandas.geodataframe.GeoDataFrame)
In [ ]:
brazil_states.geometry.head()
Out[ ]:
0    MULTIPOLYGON (((-68.87747 -11.01987, -68.88027...
1    POLYGON ((-35.46317 -8.82467, -35.46457 -8.828...
2    MULTIPOLYGON (((-50.46147 2.11133, -50.45627 2...
3    MULTIPOLYGON (((-58.49367 -0.84197, -58.48917 ...
4    MULTIPOLYGON (((-38.70687 -17.96447, -38.70867...
Name: geometry, dtype: geometry
In [ ]:
brazil_municipalities.geometry.head()
Out[ ]:
0    POLYGON ((-62.04950 -11.86731, -62.04559 -11.8...
1    POLYGON ((-62.42279 -9.80481, -62.42688 -9.806...
2    POLYGON ((-60.37329 -13.40887, -60.37323 -13.4...
3    POLYGON ((-61.00061 -10.99219, -61.00061 -11.0...
4    POLYGON ((-61.20642 -13.08759, -61.20282 -13.0...
Name: geometry, dtype: geometry

But, notice this:

In [ ]:
brazil_states.crs, brazil_municipalities.crs
Out[ ]:
(None, None)

They do not have crs information, however they can be plotted:

In [ ]:
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=False, sharey=False, figsize=(12,12))

brazil_states.plot(ax=ax1, facecolor='lightgrey', edgecolor='black')
brazil_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
Out[ ]:
<Axes: >
No description has been provided for this image

Since we are using the crs 5641 for Brazil, the initial strategy could be to set the CRS with the right projection :

In [ ]:
brazil_states.to_crs(5641)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-f2da4e029208> in <cell line: 1>()
----> 1 brazil_states.to_crs(5641)

/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.py in to_crs(self, crs, epsg, inplace)
   1422         else:
   1423             df = self.copy()
-> 1424         geom = df.geometry.to_crs(crs=crs, epsg=epsg)
   1425         df.geometry = geom
   1426         if not inplace:

/usr/local/lib/python3.10/dist-packages/geopandas/geoseries.py in to_crs(self, crs, epsg)
   1132         """
   1133         return GeoSeries(
-> 1134             self.values.to_crs(crs=crs, epsg=epsg), index=self.index, name=self.name
   1135         )
   1136 

/usr/local/lib/python3.10/dist-packages/geopandas/array.py in to_crs(self, crs, epsg)
    790         """
    791         if self.crs is None:
--> 792             raise ValueError(
    793                 "Cannot transform naive geometries.  "
    794                 "Please set a crs on the object first."

ValueError: Cannot transform naive geometries.  Please set a crs on the object first.

Python says "Please set a crs on the object first". This would mean to know the actual projection, of the geometry:

From the plots above and the rows seen, we conclude the maps are unprojected; then:

In [ ]:
# set as unprojected
brazil_states.crs = "EPSG:4326"
brazil_municipalities.crs = "EPSG:4326"

Now, we can reproject:

In [ ]:
brazil_states=brazil_states.to_crs(5641)
brazil_municipalities=brazil_municipalities.to_crs(5641)

Exercise 5 (conditional)¶

  1. Look for sub administrative divisions of your country
In [ ]:
# https://data.humdata.org/dataset/whosonfirst-data-admin-dnk

denmark_states=gpd.read_file(os.path.join(maps,"whosonfirst-data-admin-dk-latest","whosonfirst-data-admin-dk-region-polygon.shp"))

denmark_municipalities=gpd.read_file(os.path.join(maps,"whosonfirst-data-admin-dk-latest","whosonfirst-data-admin-dk-localadmin-polygon.shp"))
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=False, sharey=False, figsize=(12,12))

denmark_states.cx[0:,:].plot(ax=ax1, facecolor='lightgrey', edgecolor='black')
denmark_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
Out[ ]:
<Axes: >
No description has been provided for this image
  1. Check all the CRSs of those divisions
In [ ]:
print(type(denmark_states), type(denmark_municipalities)) #vemos que es un gdf
print(denmark_states.geometry.head()) # es multipoligonal
print(denmark_states.crs, denmark_municipalities.crs) # sí tiene crs
<class 'geopandas.geodataframe.GeoDataFrame'> <class 'geopandas.geodataframe.GeoDataFrame'>
0    MULTIPOLYGON (((-6.91145 61.62627, -6.89056 61...
1    MULTIPOLYGON (((8.40776 55.45601, 8.40627 55.4...
2    MULTIPOLYGON (((14.70088 55.14588, 14.69875 55...
3    MULTIPOLYGON (((10.98130 54.83016, 10.97817 54...
4    MULTIPOLYGON (((11.02050 57.24190, 11.06478 57...
Name: geometry, dtype: geometry
EPSG:4326 EPSG:4326
In [ ]:
print(denmark_states.crs.axis_info, '\n', denmark_municipalities.crs.axis_info) # ESTÁ EN POLARES, DEBEMOS TRANSFORMAR LUEGO A METROS
print(denmark_states.crs, denmark_municipalities.crs) # sí tiene crs
[Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree), Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)] 
 [Axis(name=Geodetic latitude, abbrev=Lat, direction=north, unit_auth_code=EPSG, unit_code=9122, unit_name=degree), Axis(name=Geodetic longitude, abbrev=Lon, direction=east, unit_auth_code=EPSG, unit_code=9122, unit_name=degree)]
EPSG:4326 EPSG:4326
In [ ]:
denmark = countries[countries.COUNTRY=='Dinamarca']
denmark.crs #4326
Out[ ]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
  1. If you find one CRS is missing, fill the CRS with the right projection.
In [ ]:
# ya tienen su CRS, lo que nos falta es hacer la correcta proyección a un crs favorable

# tomamos "FEH2010 / Fehmarnbelt TM" con "EPSG:5596"

denmark_states = denmark_states.to_crs(5596)
denmark_municipalities = denmark_municipalities.to_crs(5596)

TOC _____________

Combining GeoDF rows¶

Let me divide Brazil municipalities into a higher administrative level:

In [ ]:
brazil_municipalities.plot(facecolor='lightgrey', edgecolor='black',linewidth=0.2)
Out[ ]:
<Axes: >
No description has been provided for this image
In [ ]:
#see
brazil_municipalities.head()
Out[ ]:
ADM0_EN ADM0_PT ADM0_PCODE ADM1_PT ADM1_PCODE ADM2_PT ADM2_PCODE ET_ID geometry
0 Brazil Brasil BR Rondônia BR11 Alta Floresta D'Oeste BR1100015 0 POLYGON ((2880702.575 8678970.180, 2881137.154...
1 Brazil Brasil BR Rondônia BR11 Ariquemes BR1100023 1 POLYGON ((2839173.154 8911097.984, 2838718.204...
2 Brazil Brasil BR Rondônia BR11 Cabixi BR1100031 2 POLYGON ((3067184.343 8504324.153, 3067191.133...
3 Brazil Brasil BR Rondônia BR11 Cacoal BR1100049 3 POLYGON ((2997393.730 8777661.276, 2997393.730...
4 Brazil Brasil BR Rondônia BR11 Cerejeiras BR1100056 4 POLYGON ((2974496.868 8540812.592, 2974897.495...
In [ ]:
# higher level
brazil_municipalities.ADM1_PT.value_counts()
Out[ ]:
ADM1_PT
Minas Gerais           853
São Paulo              645
Rio Grande do Sul      499
Bahia                  417
Paraná                 399
Santa Catarina         295
Goiás                  246
Piauí                  224
Paraíba                223
Maranhão               217
Pernambuco             185
Ceará                  184
Rio Grande do Norte    167
Pará                   144
Mato Grosso            141
Tocantins              139
Alagoas                102
Rio de Janeiro          92
Mato Grosso do Sul      79
Espírito Santo          78
Sergipe                 75
Amazonas                62
Rondônia                52
Acre                    22
Amapá                   16
Roraima                 15
Distrito Federal         1
Name: count, dtype: int64

Then, this is Minas Gerais:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].plot()
Out[ ]:
<Axes: >
No description has been provided for this image

Unary UNION¶

We can combine all these polygons into one:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].unary_union
Out[ ]:
No description has been provided for this image

Let's save that result:

In [ ]:
MinasGerais_union=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].unary_union
In [ ]:
# what do we have?
type(MinasGerais_union)
Out[ ]:
shapely.geometry.multipolygon.MultiPolygon

You can turn that shapely object into a GeoDF like this:

In [ ]:
gpd.GeoDataFrame(index=[0],data={'ADM':'Minas Gerais'},
                 crs=brazil_municipalities.crs,
                 geometry=[MinasGerais_union])
Out[ ]:
ADM geometry
0 Minas Gerais MULTIPOLYGON (((4613768.235 7444013.818, 46132...

Dissolve¶

a. Dissolve as Union¶

Using dissolve is an alternative to UNION:

In [ ]:
brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve().plot()
Out[ ]:
<Axes: >
No description has been provided for this image

Let me save the result, and see the type :

In [ ]:
MinasGerais_dissolve=brazil_municipalities[brazil_municipalities.ADM1_PT=='Minas Gerais'].dissolve()

# we got?
type(MinasGerais_dissolve)
Out[ ]:
geopandas.geodataframe.GeoDataFrame
def __init__(data=None, *args, geometry=None, crs=None, **kwargs)
/usr/local/lib/python3.10/dist-packages/geopandas/geodataframe.pyA GeoDataFrame object is a pandas.DataFrame that has a column
with geometry. In addition to the standard DataFrame constructor arguments,
GeoDataFrame also accepts the following keyword arguments:

Parameters
----------
crs : value (optional)
    Coordinate Reference System of the geometry objects. Can be anything accepted by
    :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
    such as an authority string (eg "EPSG:4326") or a WKT string.
geometry : str or array (optional)
    If str, column to use as geometry. If array, will be set as 'geometry'
    column on GeoDataFrame.

Examples
--------
Constructing GeoDataFrame from a dictionary.

>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1                 geometry
0  name1  POINT (1.00000 2.00000)
1  name2  POINT (2.00000 1.00000)

Notice that the inferred dtype of 'geometry' columns is geometry.

>>> gdf.dtypes
col1          object
geometry    geometry
dtype: object

Constructing GeoDataFrame from a pandas DataFrame with a column of WKT geometries:

>>> import pandas as pd
>>> d = {'col1': ['name1', 'name2'], 'wkt': ['POINT (1 2)', 'POINT (2 1)']}
>>> df = pd.DataFrame(d)
>>> gs = geopandas.GeoSeries.from_wkt(df['wkt'])
>>> gdf = geopandas.GeoDataFrame(df, geometry=gs, crs="EPSG:4326")
>>> gdf
    col1          wkt                 geometry
0  name1  POINT (1 2)  POINT (1.00000 2.00000)
1  name2  POINT (2 1)  POINT (2.00000 1.00000)

See also
--------
GeoSeries : Series object designed to store shapely geometry objects

You got a GEOdf this time:

In [ ]:
## see
MinasGerais_dissolve
Out[ ]:
geometry ADM0_EN ADM0_PT ADM0_PCODE ADM1_PT ADM1_PCODE ADM2_PT ADM2_PCODE ET_ID
0 MULTIPOLYGON (((4613768.235 7444013.818, 46132... Brazil Brasil BR Minas Gerais BR31 Abadia dos Dourados BR3100104 2244
In [ ]:
# keeping what is relevant
MinasGerais_dissolve.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)

# then
MinasGerais_dissolve
Out[ ]:
geometry ADM0_EN ADM0_PT ADM0_PCODE ADM1_PT ADM1_PCODE
0 MULTIPOLYGON (((4613768.235 7444013.818, 46132... Brazil Brasil BR Minas Gerais BR31

b. Dissolve for groups¶

Using dissolve() with no arguments returns the union of the polygons, BUT also you get a GEOdf. However, if you have a column that represents a grouping (as we do), you can dissolve by that column:

In [ ]:
# dissolving
brazil_municipalities.dissolve(by='ADM1_PT').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)
Out[ ]:
<Axes: >
No description has been provided for this image

Again, let me save this result:

In [ ]:
Brazil_adm1_diss=brazil_municipalities.dissolve(by='ADM1_PT')

We know we have a GeoDF; let's see contents:

In [ ]:
Brazil_adm1_diss
Out[ ]:
geometry ADM0_EN ADM0_PT ADM0_PCODE ADM1_PCODE ADM2_PT ADM2_PCODE ET_ID
ADM1_PT
Acre MULTIPOLYGON (((2060163.783 8784945.389, 20600... Brazil Brasil BR BR12 Acrelândia BR1200013 52
Alagoas POLYGON ((5672327.106 8882538.373, 5672062.284... Brazil Brasil BR BR27 Água Branca BR2700102 1650
Amapá MULTIPOLYGON (((4019970.847 9872122.620, 40190... Brazil Brasil BR BR16 Serra do Navio BR1600055 295
Amazonas MULTIPOLYGON (((1990420.701 9130001.855, 19900... Brazil Brasil BR BR13 Alvarães BR1300029 74
Bahia MULTIPOLYGON (((4662529.316 8340947.682, 46619... Brazil Brasil BR BR29 Abaíra BR2900108 1827
Ceará MULTIPOLYGON (((5251532.758 9270745.656, 52514... Brazil Brasil BR BR23 Abaiara BR2300101 891
Distrito Federal POLYGON ((4513061.426 8221825.541, 4512959.571... Brazil Brasil BR BR53 Brasília BR5300108 5571
Espírito Santo MULTIPOLYGON (((5230048.277 7601262.124, 52298... Brazil Brasil BR BR32 Afonso Cláudio BR3200102 3097
Goiás MULTIPOLYGON (((4095906.674 7824766.730, 40957... Brazil Brasil BR BR52 Abadia de Goiás BR5200050 5325
Maranhão MULTIPOLYGON (((4672225.852 9020212.253, 46724... Brazil Brasil BR BR21 Açailândia BR2100055 450
Mato Grosso MULTIPOLYGON (((3596657.302 8050261.396, 35962... Brazil Brasil BR BR51 Acorizal BR5100102 5184
Mato Grosso do Sul MULTIPOLYGON (((3578221.661 7460077.796, 35777... Brazil Brasil BR BR50 Água Clara BR5000203 5105
Minas Gerais MULTIPOLYGON (((4613768.235 7444013.818, 46132... Brazil Brasil BR BR31 Abadia dos Dourados BR3100104 2244
Paraná MULTIPOLYGON (((3838507.104 6986653.341, 38381... Brazil Brasil BR BR41 Abatiá BR4100103 3912
Paraíba MULTIPOLYGON (((5537757.114 9137226.114, 55371... Brazil Brasil BR BR25 Água Branca BR2500106 1242
Pará MULTIPOLYGON (((4029416.142 8917861.350, 39986... Brazil Brasil BR BR15 Abaetetuba BR1500107 151
Pernambuco MULTIPOLYGON (((5543087.493 8978786.229, 55426... Brazil Brasil BR BR26 Abreu e Lima BR2600054 1465
Piauí POLYGON ((4788787.991 8787535.671, 4788523.170... Brazil Brasil BR BR22 Acauã BR2200053 667
Rio Grande do Norte POLYGON ((5526933.390 9281317.002, 5526247.570... Brazil Brasil BR BR24 Acari BR2400109 1075
Rio Grande do Sul MULTIPOLYGON (((3696834.468 6334740.927, 36961... Brazil Brasil BR BR43 Lagoa Mirim BR4300001 4606
Rio de Janeiro MULTIPOLYGON (((4818916.513 7344536.670, 48187... Brazil Brasil BR BR33 Angra dos Reis BR3300100 3175
Rondônia MULTIPOLYGON (((2786460.123 8567095.355, 27860... Brazil Brasil BR BR11 Alta Floresta D'Oeste BR1100015 0
Roraima POLYGON ((3104062.415 10027970.921, 3103505.61... Brazil Brasil BR BR14 Amajari BR1400027 136
Santa Catarina MULTIPOLYGON (((3902797.582 6876520.458, 39024... Brazil Brasil BR BR42 Abdon Batista BR4200051 4311
Sergipe POLYGON ((5596445.601 8718324.023, 5596255.473... Brazil Brasil BR BR28 Amparo do São Francisco BR2800100 1752
São Paulo MULTIPOLYGON (((4110594.074 7411208.575, 41102... Brazil Brasil BR BR35 Adamantina BR3500105 3267
Tocantins POLYGON ((4304809.984 8561426.853, 4305040.854... Brazil Brasil BR BR17 Abreulândia BR1700251 311

Again, we can drop columns that do not bring important information:

In [ ]:
Brazil_adm1_diss.drop(columns=['ADM2_PT','ADM2_PCODE','ET_ID'],inplace=True)
Brazil_adm1_diss.reset_index(inplace=True)
Brazil_adm1_diss.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   ADM1_PT     27 non-null     object  
 1   geometry    27 non-null     geometry
 2   ADM0_EN     27 non-null     object  
 3   ADM0_PT     27 non-null     object  
 4   ADM0_PCODE  27 non-null     object  
 5   ADM1_PCODE  27 non-null     object  
dtypes: geometry(1), object(5)
memory usage: 1.4+ KB

Exercise 6¶

  1. Create some subset of polygons with your country data.
  2. Use Unary UNION with those polygons.
  3. Create a geoDF with the result.
  4. Use dissolve with the same polygons, and create a geoDF.
  1. Create some subset of polygons with your country data.
In [ ]:
denmark_municipalities.plot(ax=ax2, facecolor='lightgrey', edgecolor='black',linewidth=0.2)
Out[ ]:
<Axes: >
<Figure size 640x480 with 0 Axes>
In [ ]:
pd.set_option('display.max_columns', None)
denmark_municipalities.head(10)
Out[ ]:
id parent_id name placetype placelocal country repo lat lon min_lat min_lon max_lat max_lon min_zoom max_zoom min_label max_label modified is_funky population country_id region_id county_id concord_ke concord_id iso_code hasc_id gn_id wd_id name_ara name_ben name_deu name_eng name_ell name_fas name_fra name_heb name_hin name_hun name_ind name_ita name_jpn name_kor name_nld name_pol name_por name_rus name_spa name_swe name_tur name_ukr name_urd name_vie name_zho geom_src geometry
0 1159297717 85682589 Holbaek localadmin kommune DK whosonfirst-data-admin-dk 55.675774 11.573848 55.508929 11.350962 55.809169 11.858369 12 None 13.5 18 2023-09-28 None 71541 85633121 85682589 None dk-geodk:code 316 None None None None None None None Holbaek None None None None None None None None None None None None None None None None None None None None None dk-geodk MULTIPOLYGON (((1028605.296 6171633.116, 10286...
1 1159297719 85682589 Stevns localadmin kommune DK whosonfirst-data-admin-dk 55.326944 12.334101 55.235697 12.107425 55.427477 12.456528 12 None 13.5 18 2023-09-28 None 22805 85633121 85682589 None dk-geodk:code 336 None None None None None None None Stevns None None None None None None None None None None None None None None None None None None None None None dk-geodk MULTIPOLYGON (((1071296.650 6133778.538, 10712...
2 1159297721 85682597 Skanderborg localadmin kommune DK whosonfirst-data-admin-dk 56.076149 9.872121 55.953987 9.633145 56.216936 10.092933 12 None 13.5 18 2023-09-28 None 62678 85633121 85682597 None dk-geodk:code 746 None None None None None None Skanderborg Skanderborg None None Skanderborg None None None None Skanderborg None None Skanderborg Skanderborg Skanderborg Сканнерборг Skanderborg Skanderborg None None None None 斯坎訥堡自治市 dk-geodk POLYGON ((894560.919 6217046.948, 894711.815 6...
3 1159297723 85682581 Lyngby-Taarbaek localadmin kommune DK whosonfirst-data-admin-dk 55.781735 12.510815 55.759558 12.413286 55.809710 12.596912 12 None 13.5 18 2023-09-28 None 56214 85633121 85682581 None dk-geodk:code 173 None None None None None None None Lyngby-Taarbaek None None None None None None None None None None None None None None None None None None None None None dk-geodk POLYGON ((1078317.333 6187878.034, 1078327.597...
4 1159297725 85682597 Herning localadmin kommune DK whosonfirst-data-admin-dk 56.203869 8.933449 55.879568 8.433759 56.414040 9.187847 12 None 13.5 18 2023-09-28 None 89127 85633121 85682597 None dk-geodk:code 657 None None None None هرنينغ None Herning Herning None هرنینگ Herning None None None Munisipalitas Herning Herning ヘアニング 헤르닝 Herning Herning None Хернинг Herning Herning Herning Гернінг None Herning 海宁 dk-geodk MULTIPOLYGON (((832268.655 6226344.658, 832274...
5 1159297727 85682581 Copenhagen localadmin kommune DK whosonfirst-data-admin-dk 55.698354 12.571489 55.612853 12.452998 55.732711 12.734255 12 None 13.5 18 2023-09-28 None 632340 85633121 85682581 None dk-geodk:code 101 None None None None كوبنهاغن কোপেনহেগেন Kopenhagen Copenhagen Κοπεγχάγη کپنهاگ Copenhague קופנהגן कोपनहेगन Koppenhága Kopenhagen Copenaghen コペンハーゲン 코펜하겐 Kopenhagen Kopenhaga Copenhaga Копенгаген Copenhague Köpenhamn Kopenhag Копенгаген کوپن ہیگن Copenhagen 哥本哈根 dk-geodk MULTIPOLYGON (((1088066.280 6176427.948, 10880...
6 1159297729 85682581 Frederikssund localadmin kommune DK whosonfirst-data-admin-dk 55.793708 11.985618 55.714237 11.845962 55.937969 12.231291 12 None 13.5 18 2023-09-28 None 45223 85633121 85682581 None dk-geodk:code 250 None None None None None None Frederikssund Frederikssund None None None None None Frederikssund None None None None Frederikssund Frederikssund None Фредерикссунн Frederikssund Frederikssund None None None None None dk-geodk MULTIPOLYGON (((1043096.801 6177341.505, 10430...
7 1159297733 85682581 Albertslund localadmin kommune DK whosonfirst-data-admin-dk 55.679988 12.347031 55.641959 12.311000 55.709789 12.396280 12 None 13.5 18 2023-09-28 None 27731 85633121 85682581 None dk-geodk:code 165 None None None None None None None Albertslund None None None None None None None None None None None Albertslund None None Albertslund Albertslund None None None None None dk-geodk POLYGON ((1065542.084 6171219.759, 1065519.958...
8 1159297735 85682597 Samso localadmin kommune DK whosonfirst-data-admin-dk 55.818602 10.581390 55.764339 10.520158 56.002364 10.793492 12 None 13.5 18 2023-09-28 None 3657 85633121 85682597 None dk-geodk:code 741 None None None None None None None Samso None None None None None None None None None None None None None None None None None None None None None dk-geodk MULTIPOLYGON (((956826.597 6195325.368, 956823...
9 1159297737 85682589 Guldborgsund localadmin kommune DK whosonfirst-data-admin-dk 54.814822 11.950360 54.559078 11.530032 54.971843 12.165719 12 None 13.5 18 2023-09-28 None 60722 85633121 85682589 None dk-geodk:code 376 None None None None None None Guldborgsund Guldborgsund None None Guldborgsund None None None None None グルドボースンド海峡 None Guldborg Sund None None None Guldborgsund Guldborg Sund None None None None 古爾德堡海峽 dk-geodk MULTIPOLYGON (((1027998.825 6079346.001, 10280...
In [ ]:
denmark_municipalities['parent_id'].value_counts() # a qué región pertenece cada localidad
Out[ ]:
parent_id
 85682581    29
 85682575    22
 85682597    19
 85682589    17
 85682593    11
-1            1
Name: count, dtype: int64
In [ ]:
denmark_municipalities[denmark_municipalities['parent_id']==85682581].plot()
No description has been provided for this image
  1. Use Unary UNION with those polygons.
In [ ]:
parent_id_union=denmark_municipalities[denmark_municipalities['parent_id']==85682581].unary_union
parent_id_union
Out[ ]:
No description has been provided for this image
  1. Create a geoDF with the result.
In [ ]:
parentGDF = gpd.GeoDataFrame(index=[0],data={'parent_id':85682581},
                 crs=denmark_municipalities.crs,
                 geometry=[parent_id_union])
parentGDF
Out[ ]:
parent_id geometry
0 85682581 MULTIPOLYGON (((1042974.632 6177385.131, 10429...
  1. Use dissolve with the same polygons, and create a geoDF.
In [ ]:
denmark_municipalities.dissolve(by='parent_id').plot(facecolor='yellow', edgecolor='black',linewidth=0.2)

denmark_parentid_diss=denmark_municipalities.dissolve(by='parent_id')
denmark_parentid_diss #este es nuestro geo dataframe
Out[ ]:
geometry id name placetype placelocal country repo lat lon min_lat min_lon max_lat max_lon min_zoom max_zoom min_label max_label modified is_funky population country_id region_id county_id concord_ke concord_id iso_code hasc_id gn_id wd_id name_ara name_ben name_deu name_eng name_ell name_fas name_fra name_heb name_hin name_hun name_ind name_ita name_jpn name_kor name_nld name_pol name_por name_rus name_spa name_swe name_tur name_ukr name_urd name_vie name_zho geom_src
parent_id
-1 MULTIPOLYGON (((1244246.738 6139688.235, 12442... 1394014155 Christianso localadmin kommune DK whosonfirst-data-admin-dk 55.320402 15.188798 55.317257 15.173827 55.330772 15.197400 12 None 13.5 18 2023-09-28 None 84 -1 -1 None dk-geodk:code 411 None None None None None None None Christianso None None None None None None None None None None None None None None None None None None None None None dk-geodk
85682575 MULTIPOLYGON (((882439.770 6083032.984, 882441... 1159297747 Haderslev localadmin kommune DK whosonfirst-data-admin-dk 55.232202 9.374950 55.125549 8.884642 55.382343 9.778885 12 None 13.5 18 2023-09-28 None 55670 85633121 85682575 None dk-geodk:code 510 None None None None أودنسه সভেন্ডবোর্গ Haderslev Haderslev Λάνγκελαντ ادنسه Kerteminde אודנסה ओडिन्से Vejen Langeland Haderslev ランゲラン島 하데르슬레우 Haderslev Haderslev Haderslev Хадерслев Haderslev Haderslev Haderslev Гадерслев اودنسے Langeland 凯特明讷 dk-geodk
85682581 MULTIPOLYGON (((1042974.632 6177385.131, 10429... 1159297723 Lyngby-Taarbaek localadmin kommune DK whosonfirst-data-admin-dk 55.781735 12.510815 55.759558 12.413286 55.809710 12.596912 12 None 13.5 18 2023-09-28 None 56214 85633121 85682581 None dk-geodk:code 173 None None None None كوبنهاغن কোপেনহেগেন Kopenhagen Lyngby-Taarbaek Κοπεγχάγη کپنهاگ Copenhague קופנהגן कोपनहेगन Koppenhága Kopenhagen Copenaghen コペンハーゲン 코펜하겐 Kopenhagen Kopenhaga Copenhaga Копенгаген Copenhague Köpenhamn Kopenhag Копенгаген کوپن ہیگن Copenhagen 哥本哈根 dk-geodk
85682589 MULTIPOLYGON (((976998.632 6074588.187, 977002... 1159297717 Holbaek localadmin kommune DK whosonfirst-data-admin-dk 55.675774 11.573848 55.508929 11.350962 55.809169 11.858369 12 None 13.5 18 2023-09-28 None 71541 85633121 85682589 None dk-geodk:code 316 None None None None لولاند রসক্লাইড Guldborgsund Holbaek Λόλλαντ لولند Guldborgsund ליירה रोसकिल्ड Slagelse Lolland Slagelse グルドボースンド海峡 슬라겔세 Guldborg Sund Slagelse Slagelse Слагельсе Guldborgsund Guldborg Sund Roskilde Слагельсе راسکیلے Slagelse 古爾德堡海峽 dk-geodk
85682593 MULTIPOLYGON (((834510.776 6287190.389, 834508... 1159297741 Vesthimmerland localadmin kommune DK whosonfirst-data-admin-dk 56.813064 9.364557 56.636899 9.068354 57.030598 9.659873 12 None 13.5 18 2023-09-28 None 36727 85633121 85682593 None dk-geodk:code 820 None None None None آلبورغ ফ্রেডেরিকশাভন Aalborg Vesthimmerland Άλμποργκ آلبورگ Aalborg אולבורג फ़्रेडेरिकशॉन Aalborg Aalborg Aalborg オールボー 올보르 Aalborg Aalborg Aalborg Ольборг Aalborg Ålborg Aalborg Ольборг آلبو Aalborg 奥尔堡 dk-geodk
85682597 MULTIPOLYGON (((802338.295 6194475.607, 802338... 1159297721 Skanderborg localadmin kommune DK whosonfirst-data-admin-dk 56.076149 9.872121 55.953987 9.633145 56.216936 10.092933 12 None 13.5 18 2023-09-28 None 62678 85633121 85682597 None dk-geodk:code 746 None None None None هرنينغ ভিবর্গ Skanderborg Skanderborg Ράντερς هرنینگ Skanderborg אורהוס वाइबोर्ग Randers Munisipalitas Herning Skanderborg ヘアニング 헤르닝 Skanderborg Skanderborg Skanderborg Сканнерборг Skanderborg Skanderborg Herning Гернінг آرہس Herning 斯坎訥堡自治市 dk-geodk
No description has been provided for this image

c. Dissolve and aggregate¶

In pandas, you have can aggregate data using some statistics. Dissolve can be used in the same way. Let me use the indicators layer:

In [ ]:
indicators.head()

We have numerical columns, and a grouping column named region. Let's get some averages by region, while combining the polygons:

In [ ]:
indicators.dissolve(
     by="region",
     aggfunc={
         "COUNTRY": "count",
         "fragility": ["mean"],
         "co2": ["mean"],
         "ForestRev_gdp": ["mean"]
     },as_index=False,
 )

Let me save and plot:

In [ ]:
indicatorsByRegion=indicators.dissolve(
     by="region",
     aggfunc={
         "COUNTRY": "count",
         "fragility": ["mean"],
         "co2": ["mean"],
         "ForestRev_gdp": ["mean"]
     },as_index=False,
 )


indicatorsByRegion.plot(column = 'region')

Without renaming, you can request a choropleth:

In [ ]:
indicatorsByRegion.plot(column =('fragility', 'mean'),scheme='quantiles', cmap='OrRd',
                        legend=True,
                        legend_kwds={"title": "Fragility",'loc': 'lower left'},
                        edgecolor='black',linewidth=0.2,
                        figsize=(15, 10))

TOC _____________

The convex hull¶

Some time you may have the need to create a polygon that serves as an envelope to a set of points.

For this example, let me compute the centroid coordinates:

In [ ]:
brazil_5641.centroid
In [ ]:
# then
brazil_5641.centroid.x.values[0],brazil_5641.centroid.y.values[0]
In [ ]:
airports_5641
In [ ]:
# coordinates
centroidX,centroidY=brazil_5641.centroid.x.values[0],brazil_5641.centroid.y.values[0]

# subsets of medium airports
Brazil_AirTopLeft=airports_5641[airports_5641.kind=='medium_airport'].cx[:centroidX,centroidY:]
Brazil_AirTopRight=airports_5641[airports_5641.kind=='medium_airport'].cx[centroidX:,centroidY:]
Brazil_AirBottomLeft=airports_5641[airports_5641.kind=='medium_airport'].cx[:centroidX,:centroidY]
Brazil_AirBottomRight=airports_5641[airports_5641.kind=='medium_airport'].cx[centroidX:,:centroidY]

Let me plot the subsets:

In [ ]:
base=Brazil_AirTopLeft.plot(facecolor='grey', alpha=0.4)
Brazil_AirTopRight.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight.plot(ax=base,facecolor='red', alpha=0.4)

Notice we have simple points in each subset:

In [ ]:
Brazil_AirBottomLeft

In this situation, you can not make a convex hull:

In [ ]:
Brazil_AirBottomLeft.convex_hull.plot()

You first need to dissolve, and then you create a hull, an envelope of convex angles:

In [ ]:
Brazil_AirBottomLeft.dissolve().convex_hull.plot()

As we saw, the convex hull is a polygon:

In [ ]:
Brazil_AirBottomLeft.dissolve().convex_hull

What if we the polygons had not been previously combined?

In [ ]:
brazil_municipalities.cx[:centroidX,:centroidY].convex_hull.plot(edgecolor='red')

That is, you get a convex hull for each geometry.

We can also use union before creating a convex hull:

In [ ]:
# just the union
large_airport=airports_5641[airports_5641.kind=='large_airport']
large_airport.unary_union
In [ ]:
# hull of the union
large_airport.unary_union.convex_hull

Let's turn the GS into a GDF:

In [ ]:
LargeAirport_hull= gpd.GeoDataFrame(index=[0],
                                    crs=large_airport.crs,
                                    geometry=[large_airport.unary_union.convex_hull])
LargeAirport_hull['name']='large airports hull' # optional

# then

LargeAirport_hull

Let's use the GDF in plotting:

In [ ]:
base=brazil_5641.plot(facecolor='yellow')
large_airport.plot(ax=base)
LargeAirport_hull.plot(ax=base,facecolor='green',
                       edgecolor='white',alpha=0.4,
                       hatch='X')

Exercise 7¶

  1. Select some points from your maps.

  2. Create the convex hull for those points.

  3. Turn the hull into a GDF.

  4. Plot the hull on top of the country.

TOC _____________

Spatial Overlay¶

We might need to create or find some geometries from the geometries we already have. Using a set theory approach, we will se the use of intersection, union, difference, and symmetric difference.

Let me create this GeoDFs:

In [ ]:
# the north
MunisN_brazil=brazil_municipalities.cx[:,centroidY:]
# the south
MunisS_brazil=brazil_municipalities.cx[:,:centroidY]
# the west
MunisW_brazil=brazil_municipalities.cx[:centroidX,:]
# the east
MunisE_brazil=brazil_municipalities.cx[centroidX:,:]

Let me plot:

In [ ]:
base=MunisN_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisS_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)
In [ ]:
base=MunisE_brazil.plot(facecolor='yellow', edgecolor='black',linewidth=0.2, alpha=0.6)
MunisW_brazil.plot(facecolor='grey', edgecolor='black',linewidth=0.2,ax=base, alpha=0.4)

Intersection¶

We keep what is common between GeoDFs:

In [ ]:
munisNS_brazil=MunisN_brazil.overlay(MunisS_brazil, how="intersection",keep_geom_type=True)
munisNS_brazil.plot()

This is similar to a spatial join:

In [ ]:
MunisN_brazil.sjoin(MunisS_brazil, how="inner", predicate='contains').plot()
In [ ]:
# keeping the overlay
munisWE_brazil=MunisW_brazil.overlay(MunisE_brazil, how="intersection",keep_geom_type=True)
munisWE_brazil.plot(edgecolor='white',linewidth=0.1)

Union¶

Let me unite the previous two GeoDFs. First, take a look at each one:

In [ ]:
munisNS_brazil.info()
In [ ]:
munisWE_brazil.info()

The overlay combines the geometries, but not the attributes. Let me subset and show you:

In [ ]:
keep=['ADM0_EN_1','ADM1_PT_1','ADM2_PT_1','geometry']
munisNS_brazil=munisNS_brazil.loc[:,keep]
munisWE_brazil=munisWE_brazil.loc[:,keep]
In [ ]:
# now
munisNS_brazil.overlay(munisWE_brazil,how="union",keep_geom_type=True)

As you see, geometries are fine, but not attributes. It is strictly NOT appending the GeoDFs:

In [ ]:
# appending
pd.concat([munisNS_brazil,munisWE_brazil],ignore_index=True)

You will append if you are interested in the keeping the attributes. But you just do the overlay if you are planing to combine final results:

In [ ]:
munisNS_brazil.dissolve().overlay(munisWE_brazil.dissolve(), how="union",keep_geom_type=True).dissolve().plot()

Let me create an object to save the previous result:

In [ ]:
muniMidBrazil=munisNS_brazil.dissolve().overlay(munisWE_brazil.dissolve(), how="union",keep_geom_type=True).dissolve()
muniMidBrazil
In [ ]:
# some cleaning

muniMidBrazil['zone']='middles'
muniMidBrazil=muniMidBrazil.loc[:,['ADM0_EN_1_1','zone','geometry']]
muniMidBrazil

Difference¶

Here, you keep what belongs to the GeoDF to left that is not in the GeoDF to the right:

In [ ]:
# with the municipalities
brazil_municipalities.overlay(muniMidBrazil, how='difference').plot()

Symmetric Difference¶

This is the opposite to intersection, you keep what is not in the intersection:

In [ ]:
MunisN_brazil.overlay(MunisS_brazil, how="symmetric_difference",keep_geom_type=False).plot()
In [ ]:
MunisW_brazil.overlay(MunisE_brazil, how="symmetric_difference",keep_geom_type=False).plot()

Exercise 8¶

  1. Apply two of these operations to your maps.
  2. Apply two of these operations to the next maps:
In [ ]:
# hulls for the mid size airports:
Brazil_AirTopLeft_hull=Brazil_AirTopLeft.dissolve().convex_hull
Brazil_AirTopRight_hull=Brazil_AirTopRight.dissolve().convex_hull
Brazil_AirBottomLeft_hull=Brazil_AirBottomLeft.dissolve().convex_hull
Brazil_AirBottomRight_hull=Brazil_AirBottomRight.dissolve().convex_hull
In [ ]:
base = brazil_5641.plot(color='white', edgecolor='black') #unprojected
muniMidBrazil.plot(ax=base,facecolor='magenta',alpha=0.4) #unprojected
LargeAirport_hull.plot(ax=base,facecolor='purple',alpha=0.4)
Brazil_AirTopLeft_hull.plot(ax=base,facecolor='grey', alpha=0.4)
Brazil_AirTopRight_hull.plot(ax=base,facecolor='orange', alpha=0.4)
Brazil_AirBottomLeft_hull.plot(ax=base,facecolor='green', alpha=0.4)
Brazil_AirBottomRight_hull.plot(ax=base,facecolor='red', alpha=0.4)

TOC _____________

Validity of Geometry¶

Geometries are created in a way that some issues may appear, especially in (multi) polygons. Let's check if our recent maps on states and municipalities are valid:

In [ ]:
# non valid
brazil_municipalities[~brazil_municipalities.is_valid]
In [ ]:
# see the invalid:
brazil_municipalities[~brazil_municipalities.is_valid].plot()

It is difficult to see what is wrong. Let's get some information:

In [ ]:
# what is wrong?

from shapely.validation import explain_validity, make_valid

explain_validity(brazil_municipalities[~brazil_municipalities.is_valid].geometry)
In [ ]:
# varieties?
brazil_municipalities['validity']=[x.split('[')[0] for x in brazil_municipalities.geometry.apply(lambda x: explain_validity(x))]
brazil_municipalities['validity'].value_counts()
In [ ]:
# solving the issue:
brazil_municipalities.drop(columns=['validity'],inplace=True)

brazil_municipalities_valid=brazil_municipalities.copy()

brazil_municipalities_valid['geometry'] = [make_valid(row)  if not row.is_valid else row for row in brazil_municipalities_valid['geometry'] ]
#any invalid?
brazil_municipalities_valid[~brazil_municipalities_valid.is_valid]

The solution we got may help for some advanced techniques, but may also give us some extra trouble. Notice that once geopandas solved the problem, you have created collections:

In [ ]:
pd.Series([type(x) for x in brazil_municipalities_valid.geometry]).value_counts()

However, the pressence of collections may be troublesome in shapefiles.

In [ ]: